Abstract

In compressed sensing, the restricted isometry property (RIP) is a sufficient condition for the
efficient reconstruction of a nearly $k$-sparse vector $x \in \mathbb{C}^d$ from $m$ linear measurements $\Phi x$. It
is desirable for $m$ to be small, and for $\Phi$ to support fast matrix-vector multiplication. In this
work, we give a randomized construction of RIP matrices $\Phi \in \mathbb{C}^{m \times d}$, preserving the $\ell_2$ norms of
all $k$-sparse vectors with distortion $1 + \varepsilon$, where the matrix-vector multiply $\Phi x$ can be computed
in nearly linear time. The number of rows $m$ is on the order of $\varepsilon^{-2} k \log d \log^2(k \log d)$. Previous
analyses of constructions of RIP matrices supporting fast matrix-vector multiplies, such as the
sampled discrete Fourier matrix, required $m$ to be larger by roughly a $\log k$ factor.

Supporting fast matrix-vector multiplication is useful for iterative recovery algorithms which
repeatedly multiply by $\Phi$ or $\Phi^*$. Furthermore, our construction, together with a connection
between RIP matrices and the Johnson-Lindenstrauss lemma in [Krahmer-Ward, SIAM J.
Math. Anal. 2011], implies fast Johnson-Lindenstrauss embeddings with asymptotically fewer
rows than previously known.

Our approach is a simple twist on previous constructions. Rather than choosing the rows 
for the embedding matrix to be rows sampled from some larger structured matrix (such as the 
discrete Fourier transform or a random circulant matrix), we instead choose each row of the 
embedding matrix to be a linear combination of a small number of rows of the original matrix, 
with random sign flips as coefficients. The main tool in our analysis is a recent bound for
the supremum of certain types of Rademacher chaos processes in [Krahmer-Mendelson-Rauhut,
arXiv abs/1207.0235].



1 Introduction 

The goal of compressed sensing [12, 24] is to efficiently reconstruct sparse, high-dimensional signals
from a small set of linear measurements. We say that a vector $x \in \mathbb{C}^d$ is $k$-sparse if $\|x\|_0 \le k$, where
$\|x\|_0$ denotes the number of non-zero entries. The idea is that if $x$ is guaranteed to be sparse or
nearly sparse (that is, close to a sparse vector), then we should be able to recover it with far fewer
than $d$ measurements. Organizing the measurements as the rows of a matrix $\Phi \in \mathbb{C}^{m \times d}$, one wants
an efficient algorithm $\mathcal{R}$ which approximately recovers a signal $x \in \mathbb{C}^d$ from the measurements $\Phi x$;
that is, $\|\mathcal{R}(\Phi x) - x\|_2$ should be small. There are several goals in the design of $\Phi$ and $\mathcal{R}$. We would
like $m \ll d$ to be as small as possible, so that $\Phi x$ can be interpreted as a compression of $x$. We also
ask that the recovery algorithm $\mathcal{R}$ be efficient, and satisfy a reasonable recovery guarantee when $x$
is close to a sparse vector.

The recovery guarantee most popular in the literature is the $\ell_2/\ell_1$ guarantee, which compares the
error between $x$ and the recovery $\mathcal{R}(\Phi x)$ to the error between $x$ and the best $k$-sparse approximation
of $x$. More precisely, to satisfy the $\ell_2/\ell_1$ guarantee there must exist a constant $C$ such that for
every $x$, $\mathcal{R}(\Phi x)$ satisfies

$$\|\mathcal{R}(\Phi x) - x\|_2 \le \frac{C}{\sqrt{k}} \cdot \inf_{y \in \mathbb{C}^d,\ \|y\|_0 \le k} \|x - y\|_1. \tag{1}$$

The value of $m$ and the pair $\Phi, \mathcal{R}$ can depend on $d$ and $k$. Above, $\|\cdot\|_p$ denotes the $\ell_p$ norm
$\|x\|_p = (\sum_i |x_i|^p)^{1/p}$, and $\|x\|_0$ denotes the number of non-zero entries of $x$.
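The right-hand side of (1) can be evaluated directly, since the best $k$-sparse approximation in $\ell_1$ keeps the $k$ largest-magnitude entries of $x$. The following is a minimal sketch of that computation; the function names and the default value of the constant `C` are illustrative assumptions, not notation from the paper.

```python
import math


def best_k_term_error_l1(x, k):
    """inf over k-sparse y of ||x - y||_1: the l1 mass outside the k largest entries."""
    magnitudes = sorted((abs(v) for v in x), reverse=True)
    return sum(magnitudes[k:])


def l2_l1_error_budget(x, k, C=1.0):
    """Right-hand side of the l2/l1 guarantee, for a hypothetical constant C."""
    return C / math.sqrt(k) * best_k_term_error_l1(x, k)
```

For an exactly $k$-sparse $x$ the budget is zero, so any recovery scheme satisfying (1) must return $x$ itself.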

In this work, we will be concerned with a sufficient condition for the $\ell_2/\ell_1$ guarantee, known
as the $(\varepsilon, 2k)$ restricted isometry property, or $(\varepsilon, 2k)$-RIP. We say that a matrix $\Phi \in \mathbb{C}^{m \times d}$ has the
$(\varepsilon, k)$-RIP if

$$\forall x \in \mathbb{C}^d, \quad \|x\|_0 \le k \ \Rightarrow\ (1 - \varepsilon)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \varepsilon)\|x\|_2^2. \tag{2}$$

It is known that if $\Phi$ satisfies the $(\varepsilon, k)$-RIP for $\varepsilon < \sqrt{2} - 1$, then $\Phi$ enables the $\ell_2/\ell_1$ guarantee for
some constant $C$ \11 \ \1?> \ WQ. Furthermore, this guarantee is achievable by efficient methods such
as solving a linear program [13][17][25].
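Condition (2) quantifies over all $k$-sparse vectors, so it cannot be certified by sampling. Still, a quick empirical probe gives a lower bound on the smallest $\varepsilon$ for which a given matrix could satisfy it. The sketch below is an illustration only: it assumes a real-valued matrix stored as a list of rows, and the helper names are hypothetical.

```python
import math
import random


def random_sparse_unit_vector(d, k, rng):
    """Random k-sparse vector in R^d, normalized to unit l2 norm."""
    x = [0.0] * d
    for j in rng.sample(range(d), k):
        x[j] = rng.gauss(0.0, 1.0)
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def empirical_distortion(Phi, k, trials=200, seed=0):
    """Largest observed | ||Phi x||_2^2 - 1 | over random k-sparse unit vectors x.

    This only lower-bounds the true restricted isometry constant.
    """
    rng = random.Random(seed)
    d = len(Phi[0])
    worst = 0.0
    for _ in range(trials):
        x = random_sparse_unit_vector(d, k, rng)
        energy = sum(sum(p * v for p, v in zip(row, x)) ** 2 for row in Phi)
        worst = max(worst, abs(energy - 1.0))
    return worst
```

A small observed distortion is necessary but not sufficient for the RIP; the uniform bounds proved in this paper are what control the supremum over all sparse vectors.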

In this work, we construct matrices $\Phi$ which satisfy the RIP with few rows, and which additionally
support fast matrix-vector multiplication. The speed of the encoding time is important not
just for encoding $x$ as $\Phi x$ but also for the reconstruction of $x$. Aside from linear programming,
there are several iterative algorithms for recovering $x$ from $\Phi x$ when $\Phi$ satisfies the RIP: for example
Iterative Hard Thresholding [8], Gradient Descent with Sparsification [29], CoSaMP [33], Hard
Thresholding Pursuit [27], Orthogonal Matching Pursuit [54], Stagewise OMP (StOMP) [26], and
Regularized OMP (ROMP) [35][46]. All these algorithms have running times essentially bounded by
the number of iterations (which is usually logarithmic in $d$ and an error parameter) times the running
time required to perform a matrix-vector multiply with either $\Phi$ or $\Phi^*$, and so it is important
that this operation be fast.

If we do not require fast matrix-vector multiplication, it is known that RIP matrices exist
with $m = \Theta(k \log(d/k))$. For example, any matrix with i.i.d. Gaussian or subgaussian entries
suffices [7][15, 42]. This is known to be optimal even for the $\ell_2/\ell_1$ recovery problem itself via
a connection to Gelfand widths [30][37] (see a discussion in [7, Section 3]), and is even required
to obtain a weaker randomized guarantee [23]. However, for such matrices, naive matrix-vector
multiplication requires time $O(dm)$. Ideally, for the applications above, this would instead be
nearly linear in $d$. This has caused a search for RIP matrices that support fast matrix-vector
multiplication, leading to constructions that unfortunately require $m$ to be larger than the optimal
by several logarithmic factors. We discuss previous work in closing this gap, and our contribution,
in more detail in Section 1.2 below.

1.1 Johnson-Lindenstrauss 

The Johnson-Lindenstrauss (JL) lemma of [34] is related to the RIP, and, as we will see below, our
constructions of RIP matrices will imply constructions of Johnson-Lindenstrauss transforms with
fast embedding time. The JL lemma states that there is a way to embed $N$ points in $\ell_2^d$ into a
linear subspace of dimension approximately $\log N$, with very little distortion.

Lemma 1. For any $0 < \varepsilon < 1/2$ and any $x_1, \ldots, x_N \in \mathbb{R}^d$, there exists a linear map $A \in \mathbb{R}^{m \times d}$ for
$m = O(\varepsilon^{-2} \log N)$ such that for all $1 \le i < j \le N$,

$$(1 - \varepsilon)\|x_i - x_j\|_2 \le \|Ax_i - Ax_j\|_2 \le (1 + \varepsilon)\|x_i - x_j\|_2.$$

For any fixed set of vectors $x_1, \ldots, x_N$, we call a matrix $A$ as in the lemma an $\varepsilon$-JL matrix for
that set. It is known that there are sets of $N$ vectors for which $m = \Omega((\varepsilon^{-2}/\log(1/\varepsilon)) \log N)$ is
required [5]. In fact, this bound holds for any, not necessarily linear, embedding into $\ell_2^m$.
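To make Lemma 1 concrete, here is a minimal sketch of the classical dense construction: a matrix with i.i.d. Gaussian entries scaled by $1/\sqrt{m}$. This is the slow $O(dm)$ baseline, not the fast embedding developed in this paper, and the function names are illustrative assumptions.

```python
import math
import random


def gaussian_jl_matrix(m, d, seed=0):
    """Dense m x d matrix with i.i.d. N(0, 1/m) entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(m)]


def apply_matrix(A, x):
    """Naive matrix-vector product, O(dm) time."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def pairwise_distance_ratios(A, points):
    """Ratios ||A x_i - A x_j||_2 / ||x_i - x_j||_2 over all pairs i < j."""
    ratios = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            diff = [p - q for p, q in zip(points[i], points[j])]
            embedded = apply_matrix(A, diff)
            ratios.append(
                math.sqrt(sum(v * v for v in embedded))
                / math.sqrt(sum(v * v for v in diff))
            )
    return ratios
```

With $m$ on the order of $\varepsilon^{-2} \log N$, all ratios land in $[1 - \varepsilon, 1 + \varepsilon]$ with good probability; the cost of `apply_matrix` is what the fast constructions aim to reduce.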

The JL lemma is a useful tool for speeding up solutions to several problems in high-dimensional
computational geometry; see for example [32][55]. Often, one has an algorithm which is fast in terms
of the number of points but slow as a function of dimension: a good strategy to approximate a solution
quickly is to first reduce the input dimension via the JL lemma before running the algorithm.
Recently dimensionality reduction via linear maps has also found applications in approximate numerical
linear algebra problems such as linear regression and low-rank approximation [19, 20, 43, 47][52],
and for the $k$-means clustering problem [9]. Going back to our original problem, the JL lemma also
implies the existence of $(\varepsilon, k)$-RIP matrices with $O(\varepsilon^{-2} k \log(d/k))$ rows [7].

Due to its algorithmic importance, it is of interest to obtain JL matrices which allow for fast
embedding time, i.e. for which the matrix-vector product $Ax$ can be computed quickly. Paralleling
the situation with the RIP, if we do not require that $A$ support fast matrix-vector multiplication,
there are many constructions of dense matrices $A$ which are JL matrices with high probability
[1][6][22][28, 33, 34, 41]. For example, we may take $A$ to have i.i.d. Gaussian or subgaussian entries.
However, for such $A$, matrix-vector multiplication takes time $O(dm)$, whereas, as before, we would like
it to be nearly linear in $d$. As with the RIP, if we require this embedding time, there is a gap of
several logarithmic factors between the upper and lower bounds on the target dimension $m$. We
review previous work and state our contributions on this gap below.

1.2 Previous Work on Fast RIP/JL, and Our Contribution 

Above, we saw the importance of constructing RIP and JL matrices which not only have few rows
but also support fast matrix-vector multiplication. Below, we review previous work in this direction.
We then state our contributions and improvements, which are summarized in Figure 1.

The best known constructions of RIP matrices with fast multiplication come from either subsampled
Fourier matrices (or related constructions) or from partial circulant matrices. Candes and Tao
showed in [15] that a matrix whose rows are $m = O(k \log^6 d)$ random rows from the Fourier matrix
satisfies the $(O(1), k)$-RIP with positive probability. The analysis of Rudelson and Vershynin [51]
and an optimization of it by Cheraghchi, Guruswami, and Velingker [18] improved the number of
rows required for the $(\varepsilon, k)$-RIP to $m = O(\varepsilon^{-2} k \log d \log^3 k)$. For circulant matrices, initial works
required $m \gg k^{3/2}$ to obtain the $(\varepsilon, k)$-RIP [31][50]; Krahmer, Mendelson and Rauhut [38] recently
improved the number of rows required to $m = O(\varepsilon^{-2} k \log^2 d \log^2 k)$.

The first work on JL matrices with fast multiplication was by Ailon and Chazelle [2], which
had $m = O(\varepsilon^{-2} \log N)$ rows and embedding time $O(d \log d + m^3)$. In certain applications $N$ can
be exponentially large in a parameter of interest, e.g. when one wants to preserve the geometry
of an entire subspace for numerical linear algebra [19][52] or $k$-means clustering [9], or the set of
all sparse vectors in compressed sensing [7]. Thus, while the number of rows in this construction
is optimal, for some applications it is important to improve the dependence on $m$ in the running
time. Ailon and Liberty [3] improved the running time to $O(d \log m + m^{2+\gamma})$ for any desired $\gamma > 0$
(with the same number of rows), and more recently the same authors gave a construction with
$m = O(\varepsilon^{-4} \log N \log^4 d)$ supporting matrix-vector multiplies in time $O(d \log d)$ [3]. Krahmer and
Ward [39] improved the target dimension to $m = O(\varepsilon^{-2} \log N \log^4 d)$.

Footnote 1: The JL lemma (Lemma 1) is most commonly stated over $\mathbb{R}$, so we state it this way here.
However, as in [39], all of our results extend to complex vectors and complex matrices.

This last improvement of [39] is actually a more general result. Specifically, they showed that,
when the columns are multiplied by independent random signs, any $(O(\varepsilon), O(\log N))$-RIP matrix
becomes an $\varepsilon$-JL matrix for a fixed set of $N$ vectors with probability $1 - N^{-\Omega(1)}$. Since we saw above
that a matrix of $O(\varepsilon^{-2} k \log d \log^3 k)$ rows sampled from the discrete Fourier or Hadamard matrix satisfies
the $(\varepsilon, k)$-RIP with constant probability, conditioning on this event and applying the result of [39]
implies a JL matrix with $m = O(\varepsilon^{-2} \log N \log d \log^3(\log N)) = O(\varepsilon^{-2} \log N \log^4 d)$ and embedding
time $O(d \log d)$. We will use the same method to obtain fast JL matrices from our constructions of
RIP matrices.

Another way to obtain JL matrices which support fast matrix-vector multiplication is to construct
sparse JL matrices [10][21, 35, 36, 56]. These constructions allow for very fast multiplication
$Ax$ when the vector $x$ is itself sparse. However, these constructions have an $O(\varepsilon)$ fraction of nonzero
entries, and it is known that any JL transform with $O(\varepsilon^{-2} \log N)$ rows requires an $\Omega(\varepsilon/\log(1/\varepsilon))$
fraction of nonzero entries [48]. Thus, for constant $\varepsilon$ and dense $x$, multiplication still requires time
$\Theta(dm)$.

In this work we propose and analyze a new method for constructing RIP matrices that support 
fast matrix-vector multiplication. Loosely speaking, our method takes any "good" ensemble of RIP
matrices, and produces an ensemble of RIP matrices with fewer rows by multiplying by a suitable 
hash matrix. We can apply our method to either subsampled Fourier matrices or partial circulant 
matrices to obtain our improved RIP matrices. 

Our construction follows a natural intuition. For example, let $A$ be the discrete Fourier matrix,
and suppose that $S$ is an $m \times d$ matrix with i.i.d. Rademacher entries, appropriately normalized.
If $m = \Theta(\varepsilon^{-2} k \log(d/k))$, then $SA$ satisfies the $(\varepsilon, k)$-RIP with high probability, because $S$ has the
RIP, and $A$ is an isometry. Unfortunately, this construction has slow matrix-vector multiplication
time. On the other hand, if $S'$ is an extremely sparse random sign matrix, with only one non-zero
per row, then $S'A$ is a subsampled Fourier matrix, supporting fast multiplication. Unfortunately, in
order to show that $S'A$ satisfies the RIP with high probability, $m$ must be increased by $\operatorname{polylog}(k)$
factors. This raises the question: can we get the best of both worlds? How sparse must the sign
matrix $S$ be to ensure RIP with few rows, and can it be sparse enough to maintain fast matrix-vector
multiplication? In some sense, this question, and our results, connect the two lines of
research (structured matrices and sparse matrices) on fast JL matrices mentioned above. Our
results imply we can improve the number of rows over previous work by using such a sparse sign
matrix with only $\operatorname{polylog}(d)$ non-zeroes per row.

Our Main Contribution: We give randomized constructions of $(\varepsilon, k)$-RIP matrices with $m =
O(\varepsilon^{-2} k \log d \log^2(k \log d))$ rows and which support matrix-vector multiplication in time $O(d \log d) +
m \cdot \log^{O(1)} d$. When combined with [39], we obtain a JL matrix with a number of rows $m =
O(\varepsilon^{-2} \log N \log d \log^2((\log N) \log d)) = O(\varepsilon^{-2} \log N \log^3 d)$ and the same embedding time. Thus for
both RIP and JL, our constructions support fast matrix-vector multiplication using the fewest rows
known.

| Ensemble | # rows $m$ needed for RIP | Matrix-vector multiplication time | Restrictions | Reference |
|---|---|---|---|---|
| Partial Fourier | $O(\varepsilon^{-2} k \log d \log^3 k)$ | $O(d \log d)$ | | [18, 51] |
| Partial Circulant | $O(\varepsilon^{-2} k \log^2 d \log^2 k)$ | $O(d \log m)$ | | [38] |
| Hash $\times$ Partial Fourier | $O(\varepsilon^{-2} k \log d \log^2(k \log d))$ | $O(d \log d) + m \operatorname{polylog} d$ | $k \ge \log^{2.5} m$ | this work |
| Hash $\times$ Partial Circulant | $O(\varepsilon^{-2} k \log d \log^2(k \log d))$ | $O(d \log m) + m \operatorname{polylog} d$ | $k \ge \log^2 m$ | this work |

Figure 1: Table of results.

Our RIP and JL matrices maintain the $O(d \log d)$ running time of the sampled discrete Fourier
matrix as long as $k < d/\operatorname{polylog} d$, and never have multiplication time larger than $d \cdot \log^{O(1)} d$ even
for $k$ as large as $d$. Our results are given in Figure 1.

We remark that the restrictions $k \ge \operatorname{polylog} m$ in Figure 1 can be eliminated as long as $\varepsilon$ is
not too small, because in this case it is already known how to obtain optimal RIP matrices with
fast multiplication for small $k$. More precisely, the Fast Johnson-Lindenstrauss Transform of [2],
combined with [7], gives an $(\varepsilon, k)$-RIP matrix with $m = O(\varepsilon^{-2} k \log(d/k))$ rows that supports matrix-vector
multiplies in time $O(d \log d)$ as long as $k < \varepsilon^2 d^{1/2}/\operatorname{polylog} d$. Meanwhile, our restrictions
in Figure 1 require $k \ge \operatorname{polylog} m$. Thus, the only case when neither our result nor the results
of [2][7] apply occurs when $\varepsilon < (\operatorname{polylog} d)/\sqrt{d}$. We note that when $\varepsilon < 1/\sqrt{d}$, it is unknown
how to obtain any $(\varepsilon, k)$-RIP matrix with fewer than $d < 1/\varepsilon^2$ rows, and this is already trivially
obtained by the identity matrix.

1.3 Notation and Preliminaries 

We set some notation. We use $[n]$ to denote the set $\{1, \ldots, n\}$. We use $\|\cdot\|_2$ to denote the $\ell_2$ norm
of a vector, and $\|\cdot\|$, $\|\cdot\|_F$ to denote the operator and Frobenius norms of a matrix, respectively.
For a set $S$ and a norm $\|\cdot\|_X$, $d_X(S)$ denotes the diameter of $S$ with respect to $\|\cdot\|_X$. The set
of $k$-sparse vectors $x \in \mathbb{C}^d$ with $\|x\|_2 \le 1$ is denoted $T_k$. In addition to $O(\cdot)$ notation, for two
functions $f, g$, we use the shorthand $f \lesssim g$ (resp. $\gtrsim$) to indicate that $f \le Cg$ (resp. $\ge$) for some
absolute constant $C$. We use $f \simeq g$ to mean $cf \le g \le Cf$ for some constants $c, C$. For clarity, we
have made no attempt to optimize the values of the constants in our analyses.

Once we define the randomized construction of our RIP matrix $\Phi$, we will control $|\|\Phi x\|_2^2 - \|x\|_2^2|$
uniformly over $T_k$, and thus will need some tools for controlling the supremum of a stochastic process
on a compact set. For a metric space $(T, d)$, the $\delta$-covering number $\mathcal{N}(T, d, \delta)$ is the size of the
smallest $\delta$-net of $T$ with respect to the metric $d$. One way to control a stochastic process on $T$ is
simply to union bound over a sufficiently fine net of $T$; a more powerful way to control stochastic
processes, due to Talagrand, is through the $\gamma_2$ functional [53].

Definition 2. For a metric space $(T, d)$, an admissible sequence of $T$ is a sequence of nets
$A_1, A_2, \ldots$ of $T$ so that $|A_n| \le 2^{2^n}$. Then

$$\gamma_2(T, d) := \inf \sup_{t \in T} \sum_{n=1}^{\infty} 2^{n/2} d(A_n, t),$$

where the infimum is taken over all admissible sequences $\{A_n\}$.

Intuitively, $\gamma_2(T, d)$ measures how "clustered" $T$ is with respect to $d$: if $T$ is very clustered, then
the union bound over nets above can be improved by a chaining argument. A similar idea is used
in Dudley's integral inequality [40, Theorem 11.1], and indeed they are related (see [53], Section
1.2) by

$$\gamma_2(T, d) \lesssim \int_0^{\infty} \sqrt{\log \mathcal{N}(T, d, u)} \, du. \tag{3}$$

It is this latter form that will be useful to us.

1.4 Organization

In Section 2, we define our construction and give an overview of our techniques. We also state
our most general theorem, Theorem 6, which gives a recipe for turning a "good" ensemble of RIP
matrices into an ensemble of RIP matrices with fewer rows. In Section 3, we apply Theorem 6 to
obtain the results listed in Figure 1. Finally, we prove Theorem 6 in Sections 4 and 5.

2 Technical Overview

Our construction is actually a general method for turning any "good" RIP matrix with a suboptimal
number of rows into an RIP matrix with fewer rows. Many previous constructions of RIP matrices
involve beginning with an appropriately structured matrix (a DFT or Hadamard matrix, or a
circulant matrix, for example), and keeping only a subset of the rows. In this work we propose a
simple twist on this idea: each row of our new matrix is a linear combination of a small number of
rows from the original matrix, with random sign flips as the coefficients. Formally, we define our
construction as follows.

Let $\mathcal{A}_M$ be a distribution on $M \times d$ matrices, defined for all $M$, and fix parameters $m$ and $B$.
Define the injective function $h : [m] \times [B] \to [mB]$ as $h(b, i) = B(b - 1) + i$ to partition $[mB]$ into
$m$ buckets of size $B$, so $h(b, i)$ denotes the $i$-th element in bucket $b$. We draw a matrix $A$ from $\mathcal{A}_{mB}$,
and then construct our $m \times d$ matrix $\Phi(A)$ by using $h$ to hash the rows of $A$ into $m$ buckets of size
$B$.

Definition 3 (Our construction). Let $\mathcal{A}_M$ be as above, and fix parameters $m$ and $B$. Define a new
distribution on $m \times d$ matrices by constructing a matrix $\Phi \in \mathbb{C}^{m \times d}$ as follows.

1. Draw $A \sim \mathcal{A}_{mB}$, and let $a_i$ denote the rows of $A$.

2. For each $(b, i) \in [m] \times [B]$, choose a sign $\sigma_{b,i} \in \{\pm 1\}$ independently, uniformly at random.

3. For $b = 1, \ldots, m$ let

$$\varphi_b = \sum_{i \in [B]} \sigma_{b,i} \, a_{h(b,i)},$$

and let $\Phi = \Phi(A, \sigma)$ be the matrix with rows $\varphi_b$.

We use $A_b$ to denote the $B \times d$ matrix with rows $a_{h(b,i)}$ for $i \in [B]$.

Equivalently, $\Phi$ may be obtained by writing $\Phi = HA$, where $A \sim \mathcal{A}_{mB}$, and $H$ is the $m \times mB$
random matrix with columns indexed by $(b, i) \in [m] \times [B]$, so that

$$H_{j,(b,i)} = \begin{cases} \sigma_{b,i} & \text{if } j = b, \\ 0 & \text{otherwise.} \end{cases}$$

Note that there are two sources of randomness in the construction of $\Phi$: there is the choice of
$A \sim \mathcal{A}_{mB}$, and also the choice of the sign flips $\sigma$ which determine the matrix $H$. Our RIP matrix
will be the appropriately normalized matrix $\Phi/\sqrt{mB}$.
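The bucketing in Definition 3 can be sketched in a few lines. The following illustration uses plain Python lists and the 1-indexed hash $h(b, i) = B(b - 1) + i$ from the text; the function names are hypothetical, and the base matrix `A` is passed in rather than drawn from any particular ensemble.

```python
import random


def h(b, i, B):
    """1-indexed hash h(b, i) = B(b - 1) + i: the i-th element of bucket b."""
    return B * (b - 1) + i


def draw_signs(m, B, seed=0):
    """Independent uniform signs sigma[b][i] in {-1, +1}."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(B)] for _ in range(m)]


def hashed_matrix(A, m, B, signs):
    """Rows phi_b = sum_i signs[b][i] * a_{h(b,i)}; A must have m*B rows."""
    assert len(A) == m * B
    d = len(A[0])
    Phi = []
    for b in range(1, m + 1):
        row = [0] * d
        for i in range(1, B + 1):
            a = A[h(b, i, B) - 1]          # convert the 1-indexed hash to a list index
            s = signs[b - 1][i - 1]
            row = [r + s * v for r, v in zip(row, a)]
        Phi.append(row)
    return Phi


def hash_matrix(m, B, signs):
    """The sparse m x mB sign matrix H with Phi = H A."""
    H = [[0] * (m * B) for _ in range(m)]
    for b in range(1, m + 1):
        for i in range(1, B + 1):
            H[b - 1][h(b, i, B) - 1] = signs[b - 1][i - 1]
    return H
```

Each row of `H` has exactly $B$ nonzeros, which is why applying it costs only $O(mB)$ once $Ax$ is known. The final normalization by $1/\sqrt{mB}$ is left to the caller.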

We consider two example distributions for $\mathcal{A}_M$. First, we consider a bounded orthogonal
ensemble.

Definition 4 (Bounded orthogonal ensembles). Let $U \in \mathbb{C}^{d \times d}$ be any unitary matrix with $|U_{ij}| \le 1$
for all entries $U_{ij}$ of $U$. Let $u_i$ denote the $i$-th row of $U$. A matrix $A \in \mathbb{C}^{M \times d}$ is drawn from the
bounded orthogonal ensemble associated with $U$ as follows. Select, independently and uniformly at
random, a multi-set $\Omega = \{t_1, \ldots, t_M\}$ with $t_i \in [d]$. Then let $A \in \mathbb{C}^{M \times d}$ be the matrix with rows
$u_{t_1}, \ldots, u_{t_M}$.

Popular choices (and our choices) for $U$ include the $d$-dimensional discrete Fourier transform
(resulting in the Fourier ensemble), or the $d \times d$ Hadamard matrix, both of which support $O(d \log d)$
time matrix-vector multiplication.

The second family we consider is the partial circulant ensemble.

Definition 5 (Partial Circulant Ensemble). For $z \in \mathbb{C}^d$, the circulant matrix $H_z \in \mathbb{C}^{d \times d}$ is given
by $H_z x = z * x$, where $*$ denotes convolution. Fix $\Omega \subset [d]$ of size $M$ arbitrarily. A matrix $A$ is
drawn from the partial circulant ensemble as follows. Choose $\epsilon \in \{\pm 1\}^d$ uniformly at random, and
let $A$ be the rows of $H_\epsilon$ indexed by $\Omega$.
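As an illustration of Definition 5, the sketch below applies a partial circulant matrix by computing the circular convolution directly and keeping the coordinates in $\Omega$. It assumes the convention $(z * x)_i = \sum_j z_{(i - j) \bmod d}\, x_j$ with 0-indexed coordinates, and it runs in quadratic time for clarity; an FFT-based convolution would be used in practice. The function names are illustrative.

```python
def circular_convolution(z, x):
    """(z * x)_i = sum_j z[(i - j) mod d] * x[j], computed directly in O(d^2)."""
    d = len(z)
    return [sum(z[(i - j) % d] * x[j] for j in range(d)) for i in range(d)]


def partial_circulant_apply(eps, omega, x):
    """Rows of the circulant matrix H_eps indexed by omega, applied to x."""
    full = circular_convolution(eps, x)
    return [full[i] for i in omega]
```

Here `eps` plays the role of the random sign vector $\epsilon$ and `omega` is the fixed index set $\Omega$ (0-indexed).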

As long as the original matrix ensemble $\mathcal{A}$ supports fast matrix-vector multiplication, so does
the resulting matrix $\Phi$. Indeed, writing $\Phi x = HAx$ as above, we observe that there are $mB$ nonzero
entries in $H$, so computing the product $HAx$ takes time $O(mB)$, plus the time it takes to compute
$Ax$. When $A$ is drawn from the partial Fourier ensemble, $Ax$ may be computed in time $O(d \log d)$
via the fast Fourier transform. We will choose $B = \operatorname{polylog}(d)$, and so $\Phi x$ may be computed in
time $O(d \log d + m \operatorname{polylog} d)$. When $A$ is the partial circulant ensemble, $Ax$ may be computed in
time $O(d \log(mB))$ by breaking it up into $d/(mB)$ blocks, each of which is an $mB \times mB$ Toeplitz matrix
supporting matrix-vector multiplication in time $O(mB \log(mB))$. Thus, in this case $\Phi x$ may be
computed in time $O(d \log(mB) + mB) = O(d \log m) + m \operatorname{polylog} d$.
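The two-stage cost for the Fourier case can be seen in a short sketch: one FFT of $x$, a subsampling step that reads off $Ax$, and then $mB$ signed additions for the hash. The code below is an illustration under stated assumptions: $d$ is a power of two, the DFT is unnormalized (entries of modulus 1), indices are 0-indexed so that bucket $b$ owns positions $bB, \ldots, bB + B - 1$, and the function names are hypothetical.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def apply_hashed_fourier(x, rows, m, B, signs):
    """Compute Phi x = H (A x), where A holds the DFT rows listed in `rows`.

    rows  : list of m*B row indices into the d x d DFT matrix (0-indexed)
    signs : signs[b][i] in {-1, +1} for bucket b and slot i
    """
    spectrum = fft(x)                      # O(d log d)
    y = [spectrum[t] for t in rows]        # A x, by subsampling the spectrum
    return [                               # H y, using O(mB) additions
        sum(signs[b][i] * y[b * B + i] for i in range(B))
        for b in range(m)
    ]
```

The output still needs the $1/\sqrt{mB}$ normalization to match the RIP matrix $\Phi/\sqrt{mB}$.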

Having established the "multiplication time" column of Figure 1, we turn to the more difficult
task of establishing the bounds on $m$, the number of rows. We note that $\Phi/\sqrt{mB}$ has the $(\varepsilon, k)$-RIP
if and only if

$$\sup_{x \in T_k} \left| \frac{1}{mB} \|\Phi x\|_2^2 - \|x\|_2^2 \right| \le \varepsilon,$$

and so our goal will be to establish bounds on $\sup_{x \in T_k} \left| \|\Phi x\|_2^2/(mB) - \|x\|_2^2 \right|$. We will show that
if $\mathcal{A}$ satisfies certain properties, then in expectation this quantity is small. Specifically we require
the following two conditions. First, we require a random matrix from $\mathcal{A}$ to have the RIP with a
reasonable, though perhaps suboptimal, number of rows:

$$\mathbb{E}_{A \sim \mathcal{A}_M} \sup_{x \in T_k} \left| \frac{1}{M} \|Ax\|_2^2 - \|x\|_2^2 \right| \le \sqrt{\frac{L}{M}} \tag{$*$}$$

for some quantity $L$, for suitably large $M > M_0$.

Second, the matrices $A_b$ whose rows are the rows of $A$ indexed by $h(b, i)$ for $i \in [B]$ should be
well-behaved. Define (†) to be the event that

$$\max_{b \in [m]} \sup_{x \in T_s} \|A_b x\|_2 \le \ell(s) \tag{$\dagger$}$$

for some function $\ell(s)$ and all $s \le 2k$. We require that (†) happen with constant probability:

$$\mathbb{P}_{A \sim \mathcal{A}}\left[ (\dagger) \text{ holds} \right] \ge 7/8 \tag{$**$}$$

for some sufficiently small function $\ell$.

As long as these two requirements on $\mathcal{A}$ are satisfied, and all matrices in the support of $\mathcal{A}$
have entries of bounded magnitude, the construction of Definition 3 yields an RIP matrix, with
appropriate parameters. The following is our most general theorem.

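Theorem 6. Fix $\varepsilon \in (0, 1)$, and fix integers $m$ and $B$. Let $\mathcal{A} = \mathcal{A}_{mB}$ be a distribution on $mB \times d$
matrices so that $\|a_i\|_\infty \le 1$ almost surely for all rows $a_i$ of $A \sim \mathcal{A}$. Suppose that (*) holds with

$$L \le mB\varepsilon^2,$$

and $M = mB > M_0$. Suppose further that (**) holds, with

$$\ell(s) \le Q_1 \sqrt{B} + Q_2 \sqrt{s},$$

and that

$$B \ge \max\{Q_2^2 \log^2 m,\ Q_1^2 \log m \log k\}, \quad \text{and} \quad k \ge Q_1^2 \log^2 m.$$

Finally, suppose that $m \ge m_0$, for

$$m_0 = \frac{k \log d \log^2(Bk)}{\varepsilon^2}.$$

Let $\Phi$ be drawn from the distribution of Definition 3. Then

$$\sup_{x \in T_k} \left| \frac{1}{mB} \|\Phi x\|_2^2 - \|x\|_2^2 \right| \le O(\varepsilon),$$

that is, $\frac{1}{\sqrt{mB}} \Phi$ satisfies the $(O(\varepsilon), k)$-RIP, with 3/4 probability.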

In Section 3, we will show how to use Theorem 6 to prove the results reported in Figure 1, but
first we will outline the intuition of the proof of Theorem 6.

By construction, the expectation of $\|\Phi x\|_2^2$ over the sign flips $\sigma$ is simply $\|Ax\|_2^2$, and (*) guarantees
that this expectation is under control, uniformly over $x \in T_k$. The trick is that $A$ has $mB$ rows,
rather than $m$, and this provides slack to handle the fact that the guarantee (*) is not optimal.

The problem is then to argue that for all $x \in T_k$, $\|\Phi x\|_2^2$ is close to its expectation. The proof
of Theorem 6 proceeds in two steps. First, we condition on $A$ and control the deviation

$$\mathbb{E}_\sigma \sup_{x \in T_k} \left| \|\Phi x\|_2^2 - \mathbb{E}_\sigma \|\Phi x\|_2^2 \right|. \tag{4}$$

Second, we take the expectation with respect to $A \sim \mathcal{A}_{mB}$.

In Theorem 11, we carry out the first step and bound the deviation (4) by Talagrand's $\gamma_2$
functional $\gamma_2(T_k, \|\cdot\|_X)$, where $\|x\|_X := \max_b \|A_b x\|_2$ is a norm which measures the contribution
to $\|\Phi x\|_2^2$ of the worst bucket $b$ of the partition function $h$. Our strategy is to write $\|\Phi x\|_2^2$ as
$\|X(x)\sigma\|_2^2$, for an appropriate matrix $X(x)$ that depends on $A$. Finally we use a result of Krahmer,
Mendelson, and Rauhut [38] to control the Rademacher chaos, obtaining an expression in terms of
$\gamma_2(T_k, \|\cdot\|_X)$.

In the second step, we unfix $A$, and $\gamma_2(T_k, \|\cdot\|_X)$ becomes a random variable. In Theorem 12,
we show that, as long as (†) holds, $\gamma_2(T_k, \|\cdot\|_X)$ is small with high probability over the choice of
$A \sim \mathcal{A}_{mB}$. By (3), it is sufficient to bound the covering numbers $\mathcal{N}(T_k, \|\cdot\|_X, u)$. This is similar
to [51], which must bound the same $\mathcal{N}(T_k, \|\cdot\|_X, u)$ but in a setting where $B = 1$. Both papers
use Maurey's empirical method to relate the covering number to $\mathbb{E}[\max_b \|A_b g\|_2]$ for a Gaussian
process $g$. But while [51] loses a $\sqrt{\log m}$ factor in a union bound over $b$, we only lose a constant
factor as long as $B \ge \operatorname{polylog} d$. This difference is what gives our $\log k$ improvement in $m$. It is
also the most technical piece of our proof, and is presented in Section 5.

Finally, we put all of the pieces together. As long as $mB$ is large enough and the condition
(*) holds, $\mathbb{E}_\sigma \|\Phi x\|_2^2/(mB)$ will be close to $\|x\|_2^2$ in expectation over $A$. At the same time, as long
as the condition (**) holds, the deviation (4) is small in expectation over $A \sim \mathcal{A}_{mB}$. Choosing $B$
appropriately controls the restricted isometry constant of $\Phi$, at the cost of slightly increasing the
embedding time.
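As a back-of-the-envelope check on where the row count $m_0$ comes from, the heuristic calculation below (not part of the formal proof) plugs the bound $\gamma_2(T_k, \|\cdot\|_X) \lesssim \sqrt{kB \log d}\,\log(Bk)$ of Theorem 12 into the three terms of the bound in Theorem 11. It makes the simplifying assumption that $\sup_{x \in T_k} \|Ax\|_2 \lesssim \sqrt{mB}$, which is what (*) with $L \le mB\varepsilon^2$ suggests, and it ignores the conditioning on the event (†).

```latex
\begin{align*}
\frac{1}{mB} \cdot \sqrt{mB} \cdot \sqrt{kB \log d}\,\log(Bk)
  &= \sqrt{\frac{k \log d \,\log^2(Bk)}{m}} \le \varepsilon
  \quad \text{when } m \ge \frac{k \log d \,\log^2(Bk)}{\varepsilon^2}, \\
\frac{1}{mB} \cdot kB \log d \,\log^2(Bk)
  &= \frac{k \log d \,\log^2(Bk)}{m} \le \varepsilon^2 \le \varepsilon
  \quad \text{for the same } m, \\
\sqrt{\frac{L}{mB}} &\le \varepsilon
  \quad \text{when } L \le mB\varepsilon^2 .
\end{align*}
```

Under these assumptions all three terms are at most $\varepsilon$ once $m \ge k \log d \log^2(Bk)/\varepsilon^2$, which matches the threshold $m_0$ in Theorem 6; with $B = \operatorname{polylog} d$ this is $O(\varepsilon^{-2} k \log d \log^2(k \log d))$.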



3 Main Results 

Before we prove Theorem 6, let us show how we may use it to conclude the results in Figure 1.
To do this, we must compute $L$ and $\ell(s)$ from the conditions (*) and (**) when $\mathcal{A}$ is the Fourier
ensemble (or any bounded orthogonal ensemble), and when $\mathcal{A}$ is the partial circulant ensemble.



3.1 Bounded orthogonal ensembles 

Suppose $\mathcal{A}$ is a bounded orthogonal ensemble. The RIP analysis of [18, 51] shows

$$\mathbb{E} \sup_{x \in T_k} \left| \frac{1}{M} \|Ax\|_2^2 - \|x\|_2^2 \right| \lesssim \sqrt{\frac{k \log^3 k \log d}{M}}$$

provided that $M \gtrsim k \log^3 k \log d$, so we may take $L \lesssim k \log^3 k \log d$. Further, the analysis of [51]
(see Lemma 17) implies that

$$\mathbb{P}_{A \sim \mathcal{A}}\left[ \exists s \in [2k] : \max_{b \in [m]} \sup_{x \in T_s} \|A_b x\|_2 > \ell(s) \right]
\le 2km \max_{s \in [2k]} \mathbb{P}_{A \sim \mathcal{A}}\left[ \sup_{x \in T_s} \|A_1 x\|_2 > \ell(s) \right] \le 1/8$$

when

$$\ell(s) \simeq \log^{1/4}(m) \sqrt{B} + \log^{1/4}(m) \sqrt{s \log^2(k) \log(d) \log(B)}.$$

Thus, we may take $Q_1 \lesssim \log^{1/4} m$ and $Q_2 \lesssim \log^{1/4}(m) \log(k) \sqrt{\log(d) \log(B)} \lesssim \log^{2.25}(d)$. With these
parameter settings, Theorem 6 implies the following theorem.

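Theorem 7. Let $\varepsilon \in (0, 1)$. Let $\mathcal{A}$ be a bounded orthogonal ensemble (for example, the Fourier
ensemble), and suppose that $\Phi$ is as in Definition 3. Further suppose $B \ge \log^{6.5} d$ and $k \ge \log^{2.5} m$.
Then for some value

$$m = O\left( \frac{k \log d \log^2(k \log d)}{\varepsilon^2} \right),$$

we have that

$$\sup_{x \in T_k} \left| \frac{1}{mB} \|\Phi x\|_2^2 - \|x\|_2^2 \right| \le \varepsilon$$

with 3/4 probability.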

3.2 Circulant Matrices 

Suppose that $\mathcal{A}$ is the partial circulant ensemble. By the analysis in [38],

$$\mathbb{E} \sup_{x \in T_k} \left| \frac{1}{M} \|Ax\|_2^2 - \|x\|_2^2 \right| \lesssim \sqrt{\frac{k \log^2 k \log^2 d}{M}},$$

for $M \gtrsim k \log^2 k \log^2 d$. Concentration also follows from the analysis in [38], as a corollary of
Theorem 10 (see [38, Theorem 4.1]).



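Lemma 8. (Implicit in [38])

$$\mathbb{P}\left[ \exists s \in [2k] : \max_{b \in [m]} \sup_{x \in T_s} \|A_b x\|_2 > \ell(s) \right] \le \frac{1}{8}$$

when

$$\ell(s) \simeq \sqrt{B} + \sqrt{s} \log k \log d.$$

Thus, we may take $Q_1 \lesssim 1$ and $Q_2 \lesssim \log k \log d$. Then Theorem 6 implies the following theorem.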

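Theorem 9. Let $\varepsilon \in (0, 1)$. Let $\mathcal{A}$ be the partial circulant ensemble, and suppose $\Phi$ is constructed
as in Definition 3. Further suppose $B \ge \log^2 m \log^2 k \log^2 d$ and $k \ge \log^2 m$. Then, for some value

$$m = O\left( \frac{k \log d \log^2(k \log d)}{\varepsilon^2} \right),$$

we have that, as long as $m \le d/B$,

$$\sup_{x \in T_k} \left| \frac{1}{mB} \|\Phi x\|_2^2 - \|x\|_2^2 \right| \le \varepsilon$$

with 3/4 probability.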

We remark that the condition $m \le d/B$ does not actually affect the results reported in Figure 1.
Indeed, if $mB > d$, we may artificially increase $d$ to $d' = mB$ by embedding $T_k$ in $\mathbb{C}^{d'}$ by zero-padding.
Applying Theorem 9 with $d = d'$ implies an RIP matrix with $O(\varepsilon^{-2} k \log d' \log^2(k \log d'))$
rows and embedding time $O(d' \log d') + m \operatorname{polylog} d'$. Because $B = \operatorname{polylog} d$, we have $d' =
d \operatorname{polylog}(d)$, and there is no asymptotic loss in $m$ by extending $d$ to $d'$. Further, in this parameter
regime, $d' \log d' = mB \log d' = m \operatorname{polylog} d$.

4 Proof of Theorem 6

We will use the following theorem from [38]. 

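Theorem 10. [38, Theorem 1.4] Let $\mathcal{S} \subset \mathbb{C}^{m \times M}$ be a symmetric set of matrices, $\mathcal{S} = -\mathcal{S}$. Let
$\sigma \in \{\pm 1\}^M$ be chosen uniformly at random. Then

$$\mathbb{E} \sup_{X \in \mathcal{S}} \left| \|X\sigma\|_2^2 - \mathbb{E}\|X\sigma\|_2^2 \right| \lesssim d_F(\mathcal{S})\, \gamma_2(\mathcal{S}, \|\cdot\|) + \gamma_2^2(\mathcal{S}, \|\cdot\|) =: E'.$$

Furthermore, for all $t > 0$,

$$\mathbb{P}\left[ \sup_{X \in \mathcal{S}} \left| \|X\sigma\|_2^2 - \mathbb{E}\|X\sigma\|_2^2 \right| > C_1 E' + t \right] \le 2 \exp\left( -C_2 \min\left\{ \frac{t^2}{V^2}, \frac{t}{U} \right\} \right),$$

where $C_1$ and $C_2$ are constants,

$$V = d_{2 \to 2}(\mathcal{S}) \left( \gamma_2(\mathcal{S}, \|\cdot\|) + d_F(\mathcal{S}) \right),$$

and

$$U = d_{2 \to 2}^2(\mathcal{S}).$$

The first step in proving Theorem 6 is to bound the restricted isometry constant of $\Phi$ in terms
of the $\gamma_2$ functional, removing the dependence on $\sigma$.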

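Theorem 11. Suppose $\mathcal{A} = \mathcal{A}_M$ is a distribution on $M \times d$ matrices so that (*) holds, and let $\Phi$
be as in Definition 3. Then

$$\mathbb{E} \sup_{x \in T_k} \left| \frac{1}{mB} \|\Phi x\|_2^2 - \|x\|_2^2 \right|
\lesssim \frac{1}{mB} \left( \mathbb{E}_A \Big[ \sup_{x \in T_k} \|Ax\|_2 \, \gamma_2(T_k, \|\cdot\|_X) \Big] + \mathbb{E}_A\, \gamma_2^2(T_k, \|\cdot\|_X) \right) + \sqrt{\frac{L}{mB}}, \tag{5}$$

where

$$\|x\|_X := \max_{b \in [m]} \|A_b x\|_2.$$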

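Proof. Let $H(b) = \{ h(b, i) : i \in [B] \}$ be the multiset of indices of the rows of $A$ in bucket $b$, and
as above let $A_b$ denote the $B \times d$ matrix whose rows are indexed by $H(b)$. Let $\sigma_b = \sum_{i=1}^{B} \sigma_{b,i} e_i$
denote the vector of sign flips associated with bucket $b$. Notice that, by construction, conditioning
on $A \sim \mathcal{A}$, we have

$$\mathbb{E}_\sigma \|\Phi x\|_2^2 = \|Ax\|_2^2, \tag{6}$$

and so

$$\begin{aligned}
\mathbb{E} \sup_{x \in T_k} \left| \frac{1}{mB} \|\Phi x\|_2^2 - \|x\|_2^2 \right|
&\le \mathbb{E}_A \left[ \frac{1}{mB} \mathbb{E}_\sigma \sup_{x \in T_k} \left| \|\Phi x\|_2^2 - \mathbb{E}_\sigma \|\Phi x\|_2^2 \right| + \sup_{x \in T_k} \left| \frac{1}{mB} \mathbb{E}_\sigma \|\Phi x\|_2^2 - \|x\|_2^2 \right| \right] \\
&= \frac{1}{mB} \mathbb{E}_A \mathbb{E}_\sigma \sup_{x \in T_k} \left| \|\Phi x\|_2^2 - \|Ax\|_2^2 \right| + \mathbb{E}_A \sup_{x \in T_k} \left| \frac{1}{mB} \|Ax\|_2^2 - \|x\|_2^2 \right| \\
&\le \frac{1}{mB} \mathbb{E}_A \mathbb{E}_\sigma \sup_{x \in T_k} \left| \|\Phi x\|_2^2 - \|Ax\|_2^2 \right| + \sqrt{\frac{L}{mB}},
\end{aligned} \tag{7}$$

where we have used (*) in the last line and (6) in the penultimate line.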

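Condition on the choice of $A$ until further notice, and consider the first term. We may write

$$E := \mathbb{E}_\sigma \sup_{x \in T_k} \left| \|\Phi x\|_2^2 - \mathbb{E}_\sigma \|\Phi x\|_2^2 \right|
= \mathbb{E}_\sigma \sup_{x \in T_k} \left| \sum_{b} |\langle \sigma_b, A_b x \rangle|^2 - \mathbb{E}_\sigma \sum_{b} |\langle \sigma_b, A_b x \rangle|^2 \right|.$$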

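Now, we apply Theorem 10 to $\mathcal{S} = \{ X(x) \in \mathbb{C}^{m \times mB} \mid x \in T_k \}$, where $X(x)$ is defined as follows:

$$X(x) = \begin{pmatrix}
(A_1 x)^* & 0 & \cdots & 0 \\
0 & (A_2 x)^* & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & (A_m x)^*
\end{pmatrix},$$

where each block $(A_b x)^*$ is a row vector of length $B$.

Let $\sigma$ be the vector in $\{-1, 1\}^{mB}$ defined as $(\sigma_1^*, \ldots, \sigma_m^*)^*$. By construction, $\|X(x)\sigma\|_2^2 = \sum_b |\langle \sigma_b, A_b x \rangle|^2$,
and so by Theorem 10 it suffices to control $d_F(\mathcal{S})$ and $\gamma_2(\mathcal{S}, \|\cdot\|)$. The Frobenius norm of $X(x)$ is

$$\|X(x)\|_F^2 = \sum_{b \in [m]} \|A_b x\|_2^2 = \|Ax\|_2^2.$$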
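The block-diagonal rewriting can be checked numerically on a toy instance. The sketch below is illustrative only: it assumes real-valued data (so the conjugate transpose is an ordinary transpose), uses 0-indexed buckets in which bucket $b$ owns rows $bB, \ldots, bB + B - 1$ of $A$, and the function names are hypothetical.

```python
def bucket_products(A, x, m, B):
    """The m vectors A_b x, where A_b consists of rows b*B .. b*B + B - 1 of A."""
    Ax = [sum(a * v for a, v in zip(row, x)) for row in A]
    return [Ax[b * B:(b + 1) * B] for b in range(m)]


def chaos_matrix(A, x, m, B):
    """The m x mB matrix X(x) whose b-th row carries (A_b x) in block b, zeros elsewhere."""
    X = [[0] * (m * B) for _ in range(m)]
    for b, block in enumerate(bucket_products(A, x, m, B)):
        X[b][b * B:(b + 1) * B] = block
    return X


def hashed_product(A, x, sigma, m, B):
    """Phi x computed directly: entry b is sum_i sigma[b*B + i] * <a_{b*B + i}, x>."""
    blocks = bucket_products(A, x, m, B)
    return [sum(sigma[b * B + i] * blocks[b][i] for i in range(B)) for b in range(m)]
```

Multiplying `chaos_matrix(A, x, m, B)` by the flattened sign vector gives the same vector as `hashed_product`, so $\|X(x)\sigma\|_2^2 = \|\Phi x\|_2^2$, and the sum of squared entries of $X(x)$ equals $\|Ax\|_2^2$.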



For the 72 term, notice that for any x,y € Tj~, 



\X{x)-X 



max ||j4fe(x 
foe [ml 



\v-V\\x 



Thus, 72(5, ||-||) = 72(7^, ||-||x). Then Theorem [TOl implies that 



E < max||Aa;|| 2 72(r fe , ||-|| x ) + 7 2 (T fc , 



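Plugging this into (7), we conclude

$$ \mathbb{E} \sup_{x \in T_k} \left| \frac{1}{mB} \|\Phi x\|_2^2 - \|x\|_2^2 \right| \lesssim \frac{1}{mB} \left( \mathbb{E}_A \sup_{x \in T_k} \|Ax\|_2 \, \gamma_2(T_k, \|\cdot\|_X) + \mathbb{E}_A \gamma_2^2(T_k, \|\cdot\|_X) \right) + \frac{L}{mB}. \qquad (8) $$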



Theorem 11 leaves us with the task of controlling $\gamma_2(T_k, \|\cdot\|_X)$, which we do in the following theorem.

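Theorem 12. Suppose that $A$ is a matrix such that §f§ holds, with

$$ \ell(s) \le Q_1 \sqrt{B} + Q_2 \sqrt{s}. $$

Suppose further that $\|a_i\|_\infty \le 1$ for all $i$, and suppose that

$$ B \ge \max\{ Q_2^2 \log^2 m, \; Q_1^2 \log m \log k \}, \quad \text{and} \quad k \ge Q_1^2 \log^2 m. $$

Then

$$ \gamma_2(T_k, \|\cdot\|_X) \lesssim \sqrt{kB \log d} \cdot \log(Bk). $$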






Proof. By (3),

$$ \gamma_2(T_k, \|\cdot\|_X) \lesssim \int_{u=0}^{Q} \sqrt{\log \mathcal{N}(T_k, \|\cdot\|_X, u)} \, du, \qquad (9) $$

where $Q = \sup_{x \in T_k} \|x\|_X$. Notice that we can bound

$$ Q^2 = \sup_{x \in T_k} \max_{b \in [m]} \|A_b x\|_2^2 = \sup_{x \in T_k} \max_{b \in [m]} \sum_{i \in H(b)} |\langle a_i, x \rangle|^2 \le B \sup_{x \in T_k} \|x\|_1^2 \le Bk, $$

using the fact that each entry of $a_i$ has magnitude at most 1. We follow the approach of [51] and estimate the covering number using two nets, one for small $u$ and one for large $u$.

For small $u$, we use a standard $\ell_2$ net of $B_2$: we have

$$ \|x\|_X \le Q \|x\|_2, $$

so $\mathcal{N}(T_k, \|\cdot\|_X, u) \le \mathcal{N}(T_k, \|\cdot\|_2, u/Q)$. Observing that $T_k$ is the union of $\binom{d}{k} \le (ed/k)^k$ copies of $B_2^k$ (the unit $\ell_2$-ball of dimension $k$), we may cover $T_k$ by covering each copy of $B_2^k$ with a net of width $u/Q$. By a standard volume estimate [49, Eqn. (5.7)], the size of each such net is at most $(1 + 2Q/u)^k$, and so

$$ \sqrt{\log \mathcal{N}(T_k, \|\cdot\|_X, u)} \lesssim \sqrt{k \log(d/k) + k \log(1 + 2Q/u)} \lesssim \sqrt{k \log(dQ/u)}. $$

For large $u$ the situation is not as simple. We show in Lemma 13 that, as long as ([[]) holds,

$$ \sqrt{\log \mathcal{N}(T_k, \|\cdot\|_X, u)} \lesssim \frac{\sqrt{kB \log d}}{u}. $$



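We plug these bounds into (9) and integrate, using the first net for $u \in (0, 1)$ and the second for $u > 1$. We find

$$
\begin{aligned}
\int_{u=0}^{Q} \sqrt{\log \mathcal{N}(T_k, \|\cdot\|_X, u)} \, du
&\lesssim \int_{u=0}^{1} \sqrt{k \log(dQ/u)} \, du + \int_{u=1}^{Q} \frac{\sqrt{kB \log d}}{u} \, du \\
&\lesssim \sqrt{k \log(dQ)} + \sqrt{kB \log d} \, \log Q \\
&\lesssim \sqrt{kB \log d} \, \log Q \\
&\lesssim \sqrt{kB \log d} \, \log(Bk),
\end{aligned}
$$

as claimed.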
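For readers who want the two integrals spelled out, here is one way to carry out the computation. It is a sketch under the assumption $dQ \ge 1$, and it uses only $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$, the substitution $u = e^{-t}$, and the bound $Q^2 \le Bk$ from the start of the proof.

```latex
% First integral: split the logarithm, then substitute u = e^{-t}.
\int_0^1 \sqrt{k \log(dQ/u)} \, du
  \le \sqrt{k \log(dQ)} + \sqrt{k} \int_0^1 \sqrt{\log(1/u)} \, du
  = \sqrt{k \log(dQ)} + \sqrt{k} \, \Gamma(3/2)
  = \sqrt{k \log(dQ)} + \tfrac{\sqrt{\pi}}{2} \sqrt{k}.

% Second integral: the integrand is a constant multiple of 1/u.
\int_1^Q \frac{\sqrt{kB \log d}}{u} \, du = \sqrt{kB \log d} \, \log Q,
\qquad
\log Q \le \tfrac{1}{2} \log(Bk) \quad \text{since } Q^2 \le Bk.
```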

It remains to put Theorem 11 and Theorem 12 together to prove Theorem 6.
Proof. (Proof of Theorem 6) We need to show that

$$ \Delta := \sup_{x \in T_k} \left| \frac{1}{mB} \|\Phi x\|_2^2 - \|x\|_2^2 \right| \le \varepsilon $$

with 3/4 probability. We have by that ([[]) holds with 7/8 probability over $\mathcal{A}$, and we will show that $\Delta \le \varepsilon$ with 7/8 probability when $A$ is drawn from the distribution $\mathcal{A}' = (\mathcal{A} \mid \text{([f]) holds})$. Together, this will imply the conclusion of Theorem 6.
Note that as long as (jjj) holds for A, (jlj) holds for A' as well. Indeed,



E sup 

A~A' xeTk 



-Ax o 



mB 



< 



E sup 

7 J A~A xeTk 



-\\Ax\\l 



mB 



< 



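so (j±j) holds for $\mathcal{A}'$. For the rest of the proof, we consider $A \sim \mathcal{A}'$, so we have

$$ \frac{1}{\sqrt{mB}} \, \mathbb{E} \sup_{x \in T_k} \|Ax\|_2 \le \sqrt{1 + O(\varepsilon)} \lesssim 1. $$

Under the parameters of Theorem 6 and because ([[]) holds for all $A \sim \mathcal{A}'$, Theorem 12 implies

$$ \gamma_2(T_k, \|\cdot\|_X) \lesssim \sqrt{kB \log d} \cdot \log(Bk). $$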



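Then

$$ \frac{1}{mB} \, \mathbb{E}_A \sup_{x \in T_k} \|Ax\|_2 \cdot \gamma_2(T_k, \|\cdot\|_X) \lesssim \sqrt{\frac{k \log(d) \log^2(Bk)}{m}} \lesssim \varepsilon. $$

Similarly,

$$ \frac{1}{mB} \, \mathbb{E}_A \gamma_2^2(T_k, \|\cdot\|_X) \lesssim \frac{k \log(d) \log^2(Bk)}{m} \lesssim \varepsilon^2. $$

By Theorem 11 and using the above bounds,

$$
\begin{aligned}
\mathbb{E}[\Delta] &\lesssim \frac{1}{mB} \left( \mathbb{E}_A \sup_{x \in T_k} \|Ax\|_2 \, \gamma_2(T_k, \|\cdot\|_X) + \mathbb{E}_A \gamma_2^2(T_k, \|\cdot\|_X) \right) + \frac{L}{mB} \\
&\lesssim \varepsilon + \varepsilon^2 + \varepsilon \\
&\lesssim \varepsilon.
\end{aligned}
$$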

Therefore by Markov's inequality, we have $\Delta \lesssim \varepsilon$ with arbitrarily high constant probability over $A \sim \mathcal{A}'$. In particular, we may adjust the constants so that $\Delta \le \varepsilon$ with probability at least 7/8 over $A \sim \mathcal{A}'$, which was our goal. ■



5 Covering number bound 

In this section, we prove the covering number lemma needed for the proof of Theorem 12. Recall the definition $\|x\|_X = \max_{b \in [m]} \|A_b x\|_2$, and that $T_k$ is the set of $k$-sparse vectors in $\mathbb{C}^d$ with $\ell_2$ norm at most 1.

Lemma 13. Suppose that the conditions of Theorem 12 hold. Then

$$ \mathcal{N}(T_k/\sqrt{k}, \|\cdot\|_X, u) \le (2d + 1)^{O(B/u^2)}. $$

We will prove this under the assumption that $x \in T_k/\sqrt{k}$ is real, using only that ([f]) holds for $s \le k$ and that $A$ has bounded entries. Then by Proposition 16 in the Appendix, we have that $\mathcal{N}(T_k/\sqrt{k}, \|\cdot\|_X, u)$ over the complex numbers is at most $\mathcal{N}(T_{2k}/\sqrt{2k}, \|\cdot\|_{\widetilde{X}}, u)$ over the reals, where $\|\cdot\|_{\widetilde{X}}$ denotes a version of $\|\cdot\|_X$ for a matrix of bounded entries that satisfies ([[]) for $s \le 2k$. Adjusting the constants by a factor of 2 gives the final result.

As in [51], we use Maurey's empirical method (see [E]). Consider $x \in T_k/\sqrt{k}$, and choose a parameter $s$. For $i \in [s]$, define a random variable $Z_i$, so that $Z_i = e_j \operatorname{sign}(x_j)$ with probability $|x_j|$ for all $j \in [d]$, and $Z_i = 0$ with probability $1 - \|x\|_1$. Notice that by the assumption that $x$ is real, $\operatorname{sign}(x_j)$ is well defined. Further, because $T_k/\sqrt{k} \subset B_1$, this is a valid probability distribution. We want to show for every $x$ that

$$ \mathbb{E} \left\| x - \frac{1}{s} \sum_{i=1}^{s} Z_i \right\|_X \lesssim \sqrt{\frac{B}{s}}. \qquad (10) $$

This would imply that the right hand side is at most $u$ for some $s \lesssim B/u^2$. If this holds, then the set of all possible $\frac{1}{s} \sum_i Z_i$ forms a $u$-covering. As there are only $2d + 1$ choices for each $Z_i$, there are only $(2d + 1)^s$ different vectors of the form $\frac{1}{s} \sum_{i=1}^{s} Z_i$. These form a $u$-covering, so Eq. (10) will imply

$$ \mathcal{N}(T_k/\sqrt{k}, \|\cdot\|_X, u) \le (2d + 1)^{O(B/u^2)}. $$
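The sampling step of Maurey's empirical method can be sketched in a few lines. The code below draws the variables $Z_i$ exactly as described above for a real vector with $\|x\|_1 \le 1$ and returns their average. For simplicity it reports the largest coordinate error rather than the $\|\cdot\|_X$ error that the proof controls, so it illustrates the sampling scheme only; the function name `maurey_average`, the test vector, and the sample count are assumptions.

```python
import random


def maurey_average(x, s, rng):
    """Average of s i.i.d. draws Z_i, where Z_i = sign(x_j) e_j with
    probability |x_j| and Z_i = 0 with probability 1 - ||x||_1."""
    assert sum(abs(v) for v in x) <= 1.0 + 1e-12, "need ||x||_1 <= 1"
    avg = [0.0] * len(x)
    for _ in range(s):
        r, acc = rng.random(), 0.0
        for j, v in enumerate(x):
            acc += abs(v)
            if r < acc:  # coordinate j is selected with probability |x_j|
                avg[j] += (1.0 if v > 0 else -1.0) / s
                break
        # If no coordinate was selected, Z_i = 0 and nothing is added.
    return avg


rng = random.Random(0)
x = [0.5, -0.3, 0.0, 0.1]  # hypothetical real vector with ||x||_1 = 0.9
approx = maurey_average(x, 20000, rng)
err = max(abs(a - b) for a, b in zip(approx, x))
print(err)
```

Each $Z_i$ takes one of $2d + 1$ values, so the average of $s$ draws ranges over at most $(2d + 1)^s$ points, which is the counting used in the covering bound.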
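We now show Eq. (10). Draw a Gaussian vector $g \sim N(0, I_s)$, and define

$$ \mathcal{G}(x) = \mathbb{E}_{Z, g} \left\| \sum_{i=1}^{s} Z_i g_i \right\|_X. $$

By a standard symmetrization argument followed by a comparison principle (Lemma 6.3 and Eq. (4.8) in [30] respectively, or the proof of Lemma 3.9 in [51]),

$$ \mathbb{E} \left\| x - \frac{1}{s} \sum_{i=1}^{s} Z_i \right\|_X \lesssim \frac{1}{s} \, \mathcal{G}(x), $$

so it suffices to bound $\mathcal{G}(x)$ by $O(\sqrt{Bs})$.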

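Let $L = \{ j : |x_j| > \frac{\log m}{k} \}$ be the set of coordinates of $x$ with "large" value in magnitude. Then

$$ \mathcal{G}(x) \le \mathcal{G}(x_L) + \mathcal{G}(x_{\overline{L}}) $$

by partitioning the $Z_i$ into those from $L$ and those from $\overline{L}$ and applying the triangle inequality. Notice that $x_L$ is "spiky" and $x_{\overline{L}}$ is "flat": more precisely, we have

$$ \|x_L\|_1 \le \frac{1}{\log m} \quad \text{and} \quad \|x_{\overline{L}}\|_\infty \le \frac{\log m}{k}, \qquad (11) $$

using Cauchy-Schwarz to bound the $\ell_1$ norm. To bound $\mathcal{G}(x_L)$ and $\mathcal{G}(x_{\overline{L}})$ we use the following lemma.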



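Lemma 14. Suppose that ([[]) holds. Then the following inequalities hold for all $x$:

$$ \mathcal{G}(x) \lesssim \sqrt{Bs \|x\|_1 \log m} \qquad (12) $$

$$ \mathcal{G}(x) \lesssim \sqrt{Bs} + \sqrt{\log m} \left( Q_1 \sqrt{B} + Q_2 \sqrt{\min(k, s)} \right) \sqrt{s \|x\|_\infty + \log k} \qquad (13) $$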
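Proof. Let $Z \in \{-1, 0, 1\}^{d \times s}$ have columns $Z_i$, so that $Zg = \sum_i Z_i g_i$. Then

$$ \mathcal{G}(x) = \mathbb{E} \max_{b \in [m]} \|A_b Z g\|_2. $$

Consider $\|A_b Z g\|_2$ for a single $b \in [m]$. This is a $C$-Lipschitz function of a Gaussian for $C = \|A_b Z\|_{2 \to 2}$. Therefore [Hfl Eq. (1.4)],

$$ \mathbb{P}_g \left[ \|A_b Z g\|_2 > \mathbb{E}_g \|A_b Z g\|_2 + t \|A_b Z\|_{2 \to 2} \right] \le e^{-\Omega(t^2)}. $$

Hence by a standard computation for subgaussian random variables ([40, Eq. (3.13)]),

$$ \mathcal{G}(x) \lesssim \mathbb{E}_Z \max_{b \in [m]} \left( \mathbb{E}_g \|A_b Z g\|_2 + \sqrt{\log m} \, \|A_b Z\|_{2 \to 2} \right). $$

Now,

$$ \mathbb{E}_g \|A_b Z g\|_2 \le \sqrt{\mathbb{E}_g \|A_b Z g\|_2^2} = \|A_b Z\|_F \le \sqrt{B \|Z\|_1}, \qquad (14) $$

where $\|Z\|_1$ is the entrywise $\ell_1$ norm of $Z$ (the number of nonzero columns), and

$$ \mathbb{E}_Z \sqrt{B \|Z\|_1} \le \sqrt{B \, \mathbb{E}_Z \|Z\|_1} = \sqrt{Bs \|x\|_1} \le \sqrt{Bs}. \qquad (15) $$

Thus

$$ \mathcal{G}(x) \le \sqrt{Bs \|x\|_1} + O\left( \mathbb{E}_Z \max_{b \in [m]} \sqrt{\log m} \, \|A_b Z\|_{2 \to 2} \right). \qquad (16) $$

Thus it suffices to bound $\|A_b Z\|_{2 \to 2}$ in terms of $\|x\|_1$ and $\|x\|_\infty$. First, we have

$$ \|A_b Z\|_{2 \to 2} \le \|A_b Z\|_F, $$

and so by Equations (14) and (15) we have

$$ \mathcal{G}(x) \lesssim \sqrt{Bs \|x\|_1 \log m}, $$

as desired for Equation (12).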

Second, we turn to Equation (13). For a matrix $A \in \mathbb{C}^{m \times d}$ and a set $S \subseteq [d]$, let $A|_S$ denote the $m \times d$ matrix with all the columns not indexed by $S$ set to zero. Then, we have

$$ \|A_b Z\|_{2 \to 2} \le \left\| A_b|_{\operatorname{supp}(Z)} \right\|_{2 \to 2} \|Z\|_{2 \to 2} \le \max_{|S| \le \min(k, s)} \left\| A_b|_S \right\|_{2 \to 2} \|Z\|_{\infty \to \infty}^{1/2}. \qquad (17) $$

In the final step, we used the fact that for any matrix $A$, $\|A\|_{2 \to 2}^2 \le \|A\|_{1 \to 1} \|A\|_{\infty \to \infty}$ (see Lemma 15 in the Appendix). By the assumption ([f]) and the choice of $\ell$,

$$ \max_{b \in [m]} \sup_{x \in T_{\min(k, s)}} \|A_b x\|_2 \le Q_1 \sqrt{B} + Q_2 \sqrt{\min(k, s)}, $$

so

$$ \max_{b \in [m]} \|A_b Z\|_{2 \to 2} \le \|Z\|_{\infty \to \infty}^{1/2} \left( Q_1 \sqrt{B} + Q_2 \sqrt{\min(k, s)} \right). $$
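Finally, we bound $\|Z\|_{\infty \to \infty}$, the largest $\ell_1$ norm of any row of $Z$. Row $j$ of $Z$ has $\ell_1$ norm equal to the number of indices $i$ with $Z_i = \operatorname{sign}(x_j) e_j$. By a Chernoff bound, for any $j \in \operatorname{supp}(x)$, we have

$$ \mathbb{P}\left[ \left| \{ i : Z_i = \operatorname{sign}(x_j) e_j \} \right| > s |x_j| + t \right] \le e^{-\Omega(t)}. $$

Integrating, we have

$$ \mathbb{E} \|Z\|_{\infty \to \infty} \lesssim s \|x\|_\infty + \log k. $$

Thus

$$ \mathbb{E}_Z \max_{b \in [m]} \|A_b Z\|_{2 \to 2} \lesssim \left( s \|x\|_\infty + \log k \right)^{1/2} \left( Q_1 \sqrt{B} + Q_2 \sqrt{\min(k, s)} \right). $$

Combining this with Equation (16) gives (13).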

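We return to the proof of Lemma 13. Recall that the goal was to bound

$$ \mathcal{G}(x_L) + \mathcal{G}(x_{\overline{L}}) \lesssim \sqrt{Bs}. $$

By (11) and (12), $\mathcal{G}(x_L) \lesssim \sqrt{Bs}$. Furthermore,

$$
\begin{aligned}
\mathcal{G}(x_{\overline{L}}) &\lesssim \sqrt{Bs} + \sqrt{\log m} \left( Q_1 \sqrt{B} + Q_2 \sqrt{\min(k, s)} \right) \sqrt{s \|x_{\overline{L}}\|_\infty + \log k} \\
&\le \sqrt{Bs} + \sqrt{\log m} \left( Q_1 \sqrt{B} + Q_2 \sqrt{\min(k, s)} \right) \sqrt{\frac{s \log m}{k} + \log k} \\
&\le \sqrt{Bs} \left( 1 + Q_1 \left( \frac{\log m}{\sqrt{k}} + \sqrt{\frac{\log m \log k}{s}} \right) + Q_2 \left( \log m \sqrt{\frac{\min(k, s)}{kB}} + \sqrt{\frac{\min(k, s) \log m \log k}{Bs}} \right) \right).
\end{aligned}
$$

Since we have assumed $B \ge Q_2^2 \log^2 m$, the $Q_2$ term is bounded by a constant. Further, $k \ge Q_1^2 \log^2 m$, and $s \ge B \ge Q_1^2 \log m \log k$, and so the $Q_1$ term is also constant. Thus, we conclude

$$ \mathcal{G}(x) \le \mathcal{G}(x_L) + \mathcal{G}(x_{\overline{L}}) \lesssim \sqrt{Bs}, $$

which was our goal.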



6 Conclusion 

In compressed sensing, it is of interest to obtain RIP matrices $\Phi$ supporting fast (i.e. nearly linear
time) matrix-vector multiplication, with as few rows as possible. Not only does fast multiplication
reduce the amount of time it takes to collect measurements, it also speeds up many iterative
recovery algorithms, which are based on repeatedly multiplying by $\Phi$ or $\Phi^*$. Similarly, because of
applications in computational geometry, numerical linear algebra, and others, one wants to obtain 
JL matrices with few rows and fast matrix-vector multiplication. In this work, we have shown how 
to construct RIP matrices supporting fast matrix-vector multiplication, with fewer rows than was 
previously known. Combined with the work of [39], this also implies improved constructions of fast 
JL matrices. 

Our work leaves the obvious open question of removing the two $O(\log(k \log d))$ factors separating
our constructions from the lower bounds. It seems that both logarithmic factors come from the
estimation (3). It would be interesting to see if they could be removed by more sophisticated
chaining techniques such as majorizing measures.
Appendix

Lemma 15. For any complex matrix $A$, $\|A\|_{2 \to 2}^2 \le \|A\|_{1 \to 1} \cdot \|A\|_{\infty \to \infty}$.

Proof. First we consider the case of Hermitian $A$, then arbitrary $A$. For Hermitian $A$, let $\lambda$ be the largest (in magnitude) eigenvalue of $A$ and $v$ be the associated eigenvector. We have

$$ \|A\|_{1 \to 1} \ge \frac{\|Av\|_1}{\|v\|_1} = \frac{\|\lambda v\|_1}{\|v\|_1} = |\lambda| = \|A\|_{2 \to 2}. $$

For arbitrary $A$,

$$ \|A\|_{2 \to 2}^2 = \|A^* A\|_{2 \to 2} \le \|A^* A\|_{1 \to 1} \le \|A^*\|_{1 \to 1} \cdot \|A\|_{1 \to 1} = \|A\|_{\infty \to \infty} \cdot \|A\|_{1 \to 1}, $$

as desired. In the last step we used the fact that $\|\cdot\|_{\infty \to \infty}$ is equal to the largest $\ell_1$ norm of any row, and $\|\cdot\|_{1 \to 1}$ is equal to the largest $\ell_1$ norm of any column. ■
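Lemma 15 can be checked numerically on a small real matrix. The sketch below is an illustration, not part of the proof: it estimates $\|A\|_{2 \to 2}^2$ from below by power iteration on $A^T A$ and compares it with the product of the largest column $\ell_1$ norm and the largest row $\ell_1$ norm. The helper name `spectral_norm_sq_lower`, the iteration count, and the $4 \times 5$ test matrix are assumptions.

```python
import random


def spectral_norm_sq_lower(A, iters=200):
    """Power iteration on A^T A for a real matrix A. The returned value
    ||Av||_2^2 (with ||v||_2 = 1) never exceeds ||A||_{2->2}^2."""
    rows, cols = len(A), len(A[0])
    v = [1.0] * cols
    for _ in range(iters):
        Av = [sum(a * b for a, b in zip(row, v)) for row in A]
        w = [sum(A[i][j] * Av[i] for i in range(rows)) for j in range(cols)]
        norm = sum(t * t for t in w) ** 0.5
        if norm == 0.0:
            return 0.0
        v = [t / norm for t in w]
    Av = [sum(a * b for a, b in zip(row, v)) for row in A]
    return sum(t * t for t in Av)


rng = random.Random(2)
A = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(4)]

# ||A||_{1->1}: largest l1 norm of any column.
norm_1_to_1 = max(sum(abs(A[i][j]) for i in range(4)) for j in range(5))
# ||A||_{inf->inf}: largest l1 norm of any row.
norm_inf_to_inf = max(sum(abs(v) for v in row) for row in A)

lower = spectral_norm_sq_lower(A)
print(lower <= norm_1_to_1 * norm_inf_to_inf)
```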

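Proposition 16. Let $f : \mathbb{C}^d \to \mathbb{R}^{2d}$ act entrywise by replacing $a + bi$ with $(a, b)$. For any integer $r$, define $F : \mathbb{C}^{r \times d} \to \mathbb{R}^{2r \times 2d}$ to act entrywise by replacing an entry $a + bi$ by the $2 \times 2$ matrix

$$ \begin{pmatrix} a & -b \\ b & a \end{pmatrix}. $$

Recall that $T_k \subset \mathbb{C}^d$ is the set of unit norm $k$-sparse complex vectors, and let $S_s \subset \mathbb{R}^{2d}$ be the set of unit norm $s$-sparse real vectors. Recall that $\|\cdot\|_X$ is a norm on $\mathbb{C}^d$ given by $\|x\|_X = \max_b \|A_b x\|_2$, and let $\|\cdot\|_{\widetilde{X}}$ be a norm on $\mathbb{R}^{2d}$ given by $\|x\|_{\widetilde{X}} = \max_b \|F(A_b) x\|_2$. Then

1. If the bound $\max_b \sup_{x \in T_s} \|A_b x\|_2 \le \ell(s)$ holds, then $\max_b \sup_{x \in S_s} \|F(A_b) x\|_2 \le \ell(s)$ for $s \le 2k$.

2. With $\|\cdot\|_{\widetilde{X}}$ as above, we have

$$ \mathcal{N}(T_k, \|\cdot\|_X, u) \le \mathcal{N}(S_{2k}, \|\cdot\|_{\widetilde{X}}, u). $$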

Proof. By construction, we have $f(Ax) = F(A) f(x)$, and also $\|f(x)\|_2 = \|x\|_2$. Further, $f(T_k) \subset S_{2k}$ and $f^{-1}(S_s) \subset T_s$. Thus, item 1 follows because

$$ \max_b \sup_{x \in S_s} \|F(A_b) x\|_2 \le \max_b \sup_{y \in T_s} \|F(A_b) f(y)\|_2 = \max_b \sup_{y \in T_s} \|A_b y\|_2 \le \ell(s). $$

Similarly, item 2 follows because for any $x, y \in T_k$,

$$
\begin{aligned}
\|x - y\|_X &= \max_{b \in [m]} \|A_b (x - y)\|_2 \\
&= \max_{b \in [m]} \|F(A_b) f(x - y)\|_2 \\
&= \max_{b \in [m]} \|F(A_b) (f(x) - f(y))\|_2 \\
&= \|f(x) - f(y)\|_{\widetilde{X}}.
\end{aligned}
$$

Hence

$$ \mathcal{N}(T_k, \|\cdot\|_X, u) = \mathcal{N}(f(T_k), \|\cdot\|_{\widetilde{X}}, u) \le \mathcal{N}(S_{2k}, \|\cdot\|_{\widetilde{X}}, u). $$
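The two facts the proof starts from, $f(Ax) = F(A) f(x)$ and $\|f(x)\|_2 = \|x\|_2$, can be confirmed on random data. The sketch below implements the maps `f` and `F` as described in Proposition 16; the helper `matvec` and the sizes `r = 3`, `d = 4` are assumptions for the example.

```python
import random


def f(x):
    """Map C^d -> R^{2d}: each entry a + bi becomes the pair (a, b)."""
    out = []
    for z in x:
        out.extend((z.real, z.imag))
    return out


def F(A):
    """Map C^{r x d} -> R^{2r x 2d}: each entry a + bi becomes [[a, -b], [b, a]]."""
    out = []
    for row in A:
        top, bottom = [], []
        for z in row:
            top.extend((z.real, -z.imag))
            bottom.extend((z.imag, z.real))
        out.extend((top, bottom))
    return out


def matvec(M, v):
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


rng = random.Random(3)
r, d = 3, 4  # hypothetical toy sizes
A = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d)]
     for _ in range(r)]
x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d)]

left = f(matvec(A, x))       # f(Ax)
right = matvec(F(A), f(x))   # F(A) f(x)
max_gap = max(abs(p - q) for p, q in zip(left, right))

norm_complex = sum(abs(z) ** 2 for z in x) ** 0.5   # ||x||_2
norm_real = sum(t * t for t in f(x)) ** 0.5         # ||f(x)||_2
print(max_gap < 1e-9, abs(norm_complex - norm_real) < 1e-9)
```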



Lemma 17. Let $\mathcal{F}$ denote the $d \times d$ Fourier matrix. Let $\Omega$ with $|\Omega| = B$ be a random multiset with elements in $[d]$, and for $S \subseteq [d]$ let $\mathcal{F}_{\Omega \times S}$ denote the $|\Omega| \times |S|$ matrix whose rows are the rows of $\mathcal{F}$ in $\Omega$, restricted to the columns in $S$. Then for any $t > 1$,

$$ \max_{|S| = k} \|\mathcal{F}_{\Omega \times S}\| \lesssim \sqrt{t (B + k\beta)} $$

with probability at least

$$ 1 - O\left( \exp\left( -\min\{ t^2, t\beta \} \right) \right), $$

where

$$ \beta = \log^2 k \, \log d \, \log B. $$

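Proof. (Implicit in [51]). Let $X = \sup_{|S| = k} \left\| I_k - \frac{1}{B} \mathcal{F}_{\Omega \times S}^* \mathcal{F}_{\Omega \times S} \right\|$, where $I_k$ is the $k \times k$ identity matrix. It is shown in [51] that

$$ \mathbb{E} X \lesssim \sqrt{\frac{k \log^2 k \, \log d \, \log B}{B}} \, \sqrt{\mathbb{E} X + 1} = \sqrt{\frac{k\beta}{B} \left( \mathbb{E} X + 1 \right)}. $$

This implies that

$$ \mathbb{E} X \lesssim 1 + \frac{k\beta}{B}. \qquad (18) $$

Indeed, whenever $x^2 \le A(x + 1)$, we have $x \le A + 1$, or else we conclude $(A + 1)^2 < A^2 + 2A$, which is false. Let $\alpha$ denote the right hand side of (18). We may plug this expectation into the proof of Theorem 3.9 in [51], and we obtain

$$ \mathbb{P}[X > Ct\alpha] \le 3 \exp(-C' t \alpha B / k) + 2 \exp(-t^2) $$

for constants $C$ and $C'$. In the case $X \le Ct\alpha$, we have

$$ \max_{|S| = k} \|\mathcal{F}_{\Omega \times S}\| \le \sqrt{B (1 + Ct\alpha)} \le \sqrt{B} + \sqrt{BCt\alpha}, $$

and so we conclude that

$$ \max_{|S| = k} \|\mathcal{F}_{\Omega \times S}\| \le \sqrt{B} + O\left( \sqrt{t (B + k\beta)} \right) $$

with probability at least

$$ 1 - 3 \exp(-C' t (\beta + B/k)) - 2 \exp(-t^2). $$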
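The elementary step used to pass to (18), namely that $x^2 \le A(x + 1)$ forces $x \le A + 1$, can be seen from the quadratic formula: the largest admissible $x$ is the positive root of $x^2 - Ax - A = 0$. The short sketch below evaluates that root for a few illustrative values of $A$ (the helper name `largest_root` and the sample values are assumptions) and compares it with $A + 1$.

```python
import math


def largest_root(A):
    """Largest x >= 0 with x^2 <= A * (x + 1): the positive root of
    x^2 - A*x - A = 0."""
    return (A + math.sqrt(A * A + 4 * A)) / 2


# Since sqrt(A^2 + 4A) < A + 2, the root is always below A + 1.
checks = [(A, largest_root(A)) for A in (0.01, 0.5, 1.0, 10.0, 1e3)]
for A, x in checks:
    print(A, x, x <= A + 1)
```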